17 research outputs found

Unsupervised User Stance Detection on Twitter

Author: Aupetit Michaël
Darwish Kareem
Nakov Preslav
Stefanov Peter
Publication date: 21/05/2020

We present a highly effective unsupervised framework for detecting the stance of prolific Twitter users with respect to controversial topics. In particular, we use dimensionality reduction to project users onto a low-dimensional space, followed by clustering, which allows us to find core users that are representative of the different stances. Our framework has three major advantages over pre-existing methods, which are based on supervised or semi-supervised classification. First, we do not require any prior labeling of users: instead, we create clusters, which are much easier to label manually afterwards, e.g., in a matter of seconds or minutes instead of hours. Second, there is no need for domain- or topic-level knowledge either to specify the relevant stances (labels) or to conduct the actual labeling. Third, our framework is robust in the face of data skewness, e.g., when some users or some stances have greater representation in the data. We experiment with different combinations of user similarity features, dataset sizes, dimensionality reduction methods, and clustering algorithms to ascertain the most effective and most computationally efficient combinations across three different datasets (in English and Turkish). We further verify our results on additional tweet sets covering six different controversial topics. Our best combination in terms of effectiveness and efficiency uses retweeted accounts as features, UMAP for dimensionality reduction, and Mean Shift for clustering, and yields a small number of high-quality user clusters, typically just 2-3, with more than 98% purity. The resulting user clusters can be used to train downstream classifiers. Moreover, our framework is robust to variations in the hyperparameter values and to random initialization.

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications
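The clustering step and the purity figure in the abstract above can be illustrated with a small sketch. It is a hypothetical example, not the authors' code: it assumes users have already been projected to 2-D points (the UMAP step over retweeted-account features is not reproduced), implements a flat-kernel Mean Shift from scratch, and scores the result with cluster purity, i.e. the fraction of users that share their cluster's majority stance. The function names, the toy data, the `bandwidth` value, and the mode-merging rule (modes closer than half the bandwidth are merged) are all assumptions chosen for illustration.

```python
import math
from collections import Counter

def mean_shift(points, bandwidth, max_iter=100, tol=1e-6):
    """Flat-kernel Mean Shift: repeatedly move a seed to the mean of all points
    within `bandwidth` of it, then group seeds whose modes coincide."""
    modes = []
    for seed in points:
        x = seed
        for _ in range(max_iter):
            window = [q for q in points if math.dist(x, q) <= bandwidth]
            new_x = tuple(sum(coord) / len(window) for coord in zip(*window))
            shift = math.dist(x, new_x)
            x = new_x
            if shift < tol:
                break
        modes.append(x)
    # Assumed merging rule: modes closer than bandwidth / 2 form one cluster.
    centers, labels = [], []
    for m in modes:
        for idx, c in enumerate(centers):
            if math.dist(m, c) < bandwidth / 2:
                labels.append(idx)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels, centers

def purity(labels, truth):
    """Fraction of items that share their cluster's majority label."""
    members = {}
    for lab, t in zip(labels, truth):
        members.setdefault(lab, []).append(t)
    majority = sum(Counter(ts).most_common(1)[0][1] for ts in members.values())
    return majority / len(truth)

# Hypothetical 2-D embedding of six users and their true stances.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
stances = ["pro", "pro", "pro", "anti", "anti", "anti"]
labels, centers = mean_shift(points, bandwidth=1.0)
print(labels, purity(labels, stances))  # → [0, 0, 0, 1, 1, 1] 1.0
```

In practice one would more likely use `umap.UMAP` from the umap-learn package and `sklearn.cluster.MeanShift` from scikit-learn; the from-scratch version only makes the mechanics visible.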

The future of sleep health: a data-driven revolution in sleep science and medicine.

Author: Aupetit Michaël
Fernandez-Luque Luis
Garcia-Gomez Juan M
Guan Yu
Mall Raghvendra
Palotti Joao
Perez-Pozuelo Ignacio
Taheri Shahrad
Zhai Bing
Publication venue: NPJ Digit Med
Publication date: 30/04/2020

In recent years, there has been a significant expansion in the development and use of multi-modal sensors and technologies to monitor physical activity, sleep and circadian rhythms. These developments make accurate sleep monitoring at scale a possibility for the first time. Vast amounts of multi-sensor data are being generated with potential applications ranging from large-scale epidemiological research linking sleep patterns to disease, to wellness applications, including the sleep coaching of individuals with chronic conditions. However, in order to realise the full potential of these technologies for individuals, medicine and research, several significant challenges must be overcome. There are important outstanding questions regarding performance evaluation, as well as data storage, curation, processing, integration, modelling and interpretation. Here, we leverage expertise across neuroscience, clinical medicine, bioengineering, electrical engineering, epidemiology, computer science, mHealth and human-computer interaction to discuss the digitisation of sleep from an interdisciplinary perspective. We introduce the state-of-the-art in sleep-monitoring technologies, and discuss the opportunities and challenges from data acquisition to the eventual application of insights in clinical and consumer settings. Further, we explore the strengths and limitations of current and emerging sensing methods with a particular focus on novel data-driven technologies, such as Artificial Intelligence.

Apollo (Cambridge)

FigShare

Learning topology with the generative Gaussian graph and the EM algorithm

Author: Michaël Aupetit

Given a set of points and a set of prototypes representing them, how can one build a graph of the prototypes whose topology reflects that of the points? This problem had not yet been explored in the framework of statistical learning theory. In this work, we propose a generative model based on the Delaunay graph of the prototypes, with parameters learned by the Expectation-Maximization algorithm. This work is a first step towards building a statistically grounded topological model of a set of points.

CiteSeerX
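A much-simplified sketch of the idea described above may help: each prototype contributes an isotropic Gaussian component, each graph edge contributes a "Gaussian segment" (a Gaussian kernel averaged along the segment between its two prototypes), and EM re-estimates the mixing weights; edges whose weight collapses toward zero are read as unsupported by the data. This is an illustration under stated assumptions, not the paper's algorithm: the candidate graph (here the triangle formed by three prototypes, which is their Delaunay graph) is given rather than computed, the noise level `sigma` is fixed instead of learned, the segment density is approximated with a midpoint rule, and the function names and the 0.01 pruning threshold are hypothetical.

```python
import math

def gauss2d(x, mu, sigma):
    """Isotropic 2-D Gaussian density N(x; mu, sigma^2 I)."""
    d2 = (x[0] - mu[0]) ** 2 + (x[1] - mu[1]) ** 2
    return math.exp(-d2 / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)

def segment_density(x, a, b, sigma, n=50):
    """Gaussian kernel averaged along segment [a, b] (midpoint rule, n steps)."""
    total = 0.0
    for k in range(n):
        t = (k + 0.5) / n
        mu = (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
        total += gauss2d(x, mu, sigma)
    return total / n

def em_mixing_weights(data, prototypes, edges, sigma, n_iter=50):
    """EM on the mixing weights only; component shapes stay fixed."""
    # Component densities never change here, so compute them once.
    dens = [[gauss2d(x, w, sigma) for w in prototypes]
            + [segment_density(x, prototypes[i], prototypes[j], sigma) for i, j in edges]
            for x in data]
    n_comp = len(prototypes) + len(edges)
    pi = [1.0 / n_comp] * n_comp
    for _ in range(n_iter):
        # E-step: responsibilities; M-step: new weight = mean responsibility.
        totals = [0.0] * n_comp
        for row in dens:
            z = sum(p * d for p, d in zip(pi, row))
            for k in range(n_comp):
                totals[k] += pi[k] * row[k] / z
        pi = [t / len(data) for t in totals]
    return pi[:len(prototypes)], pi[len(prototypes):]

# Three prototypes; the data lie only along the edge from A to B.
prototypes = [(0.0, 0.0), (1.0, 0.0), (0.5, 1.0)]  # A, B, C
edges = [(0, 1), (1, 2), (2, 0)]                   # AB, BC, CA
data = [(i / 20, 0.0) for i in range(21)]
vertex_w, edge_w = em_mixing_weights(data, prototypes, edges, sigma=0.05)
kept = [e for e, w in zip(edges, edge_w) if w > 0.01]
print(kept)  # → [(0, 1)]
```

Only the edge actually covered by data keeps a non-negligible weight, which is the intuition behind using a generative model to decide which Delaunay edges reflect the topology of the points.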

Robust topology representing networks

Author: Michaël Aupetit

Martinetz and Schulten proposed the use of a Competitive Hebbian Learning (CHL) rule to build Topology Representing Networks. Given a set of units and a data distribution, CHL creates a link between the closest and second-closest units to each datum, producing a graph that preserves the topology of the data set. In practice, however, data distributions are finite and generally corrupted by noise, and CHL may then be inefficient. We propose a more robust approach to building a topology representing graph that takes the density of the data distribution into account.

CiteSeerX
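The Competitive Hebbian Learning rule described above fits in a few lines: for each datum, find its closest and second-closest units and link them. The sketch below is hypothetical and is not the paper's method. To hint at why noise is a problem, it also counts how many data points induce each edge and optionally drops edges induced fewer than `min_count` times; this simple count threshold, the function name, and the toy data are assumptions for illustration only, not the density-based approach the abstract proposes.

```python
import math
from collections import Counter

def chl_edges(data, units, min_count=1):
    """Competitive Hebbian Learning: each datum links its closest and
    second-closest units. Edges induced fewer than `min_count` times are
    dropped (hypothetical threshold, for illustration only)."""
    counts = Counter()
    for x in data:
        # Rank units by Euclidean distance to the datum.
        order = sorted(range(len(units)), key=lambda k: math.dist(x, units[k]))
        i, j = sorted(order[:2])
        counts[(i, j)] += 1
    return {edge for edge, c in counts.items() if c >= min_count}

units = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (1.0, 2.0)]
data = [(0.3, 0.0), (0.5, 0.1), (0.7, -0.1),
        (1.3, 0.0), (1.5, 0.1), (1.7, -0.1),
        (0.4, 1.2)]  # the last point is an isolated, noisy datum
print(sorted(chl_edges(data, units)))               # → [(0, 1), (0, 3), (1, 2)]
print(sorted(chl_edges(data, units, min_count=2)))  # → [(0, 1), (1, 2)]
```

A single noisy datum is enough for plain CHL to add the spurious edge (0, 3), which is the kind of fragility the abstract points to.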

Manifold approximation with self-organizing neural networks

Author: AUPETIT Michaël
HAURAT Alain
Publication date: 01/01/2001

Discrimination, classification, function approximation, diagnosis, and control problems, which arise notably in industrial engineering, can be reduced to a manifold approximation problem. We propose a method for approximating the manifolds underlying a data distribution, based on a self-organizing connectionist approach and proceeding in three steps: positioning representatives of the distribution with vector quantization techniques yields a discrete model; learning the topology of this distribution by building the induced Delaunay triangulation with a competitive learning algorithm yields a piecewise-linear model; and a nonlinear interpolation leads to a nonlinear model of the manifolds. Our first contribution concerns a new type of neighborhood, the "γ-Observable" neighborhood: its definition, the study of its geometric properties, and algorithms for finding it. It combines advantages of the k-nearest-neighbors neighborhood and of the natural neighborhood, and can be used in high dimension and in vector quantization. Our second contribution is an interpolation method based on "Voronoi kernels" that ensures the orthogonality property needed for manifold modeling, with a computational complexity equal to or lower than that of existing interpolation methods. This technique is related to the γ-Observable neighborhood and makes it possible to build different Gaussian kernels used in RBF networks. The tools developed in this original approach are applied to function approximation for the identification of an electropneumatic gripper, to manifold approximation, and to discrimination and data analysis.
In particular, it is shown that 0.5-observable neighbors are useful for defining the boundary points between classes and for assigning elements to their class.

OpenGrey Repository
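The first of the three steps in the abstract above, positioning representatives by vector quantization, can be sketched with Lloyd's k-means iteration, one standard vector quantization technique. The abstract does not say which quantizer the thesis uses, so this choice, the function name, and the toy data are assumptions. Each datum is assigned to its nearest prototype (its Voronoi cell), and each prototype then moves to the centroid of its cell; the resulting prototypes form the discrete model on which the topology-learning step would build the induced Delaunay triangulation.

```python
import math

def lloyd_vq(data, init_prototypes, n_iter=20):
    """Lloyd iteration: assign each datum to its nearest prototype, then move
    each prototype to the centroid of the data assigned to it."""
    protos = [tuple(p) for p in init_prototypes]
    for _ in range(n_iter):
        cells = [[] for _ in protos]
        for x in data:
            nearest = min(range(len(protos)), key=lambda k: math.dist(x, protos[k]))
            cells[nearest].append(x)
        # An empty cell keeps its previous prototype.
        protos = [tuple(sum(c) / len(cell) for c in zip(*cell)) if cell else p
                  for cell, p in zip(cells, protos)]
    return protos

data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
print(lloyd_vq(data, init_prototypes=[(0.0, 0.0), (10.0, 10.0)]))
# → [(0.5, 0.5), (10.5, 10.5)]
```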

Measuring and visualizing distortions in continuous projection techniques

Author: Michaël Aupetit
Pierre Gaillard
Publication venue: Lavoisier

Crossref